Using Cox Proportional Hazards Model
April 15, 2025
What is it?
A statistical regression method specializing in modeling time-to-event predictions with survival data (Abeysekera and Sooriyarachchi 2009)
Can handle censored data, i.e. subjects whose event time is only partially observed
Primarily used in the health field, but also applied to predicting bank failure, the survival probability of machines, and the likelihood of insurance payouts
The model assumes that as time goes on, the survival probability will approach zero with no survivors (Asghar, Khalil, and Uddin 2024)
The proportional hazards assumption can limit the ability to correctly predict the effect of a variable (Jiang, Wu, and Li 2024)
The covariate selection can become biased and may not accurately represent the true data (Wang, Chang, and Lin 2025; Zhang, Cheng, and Carrillo-Larco 2025)
The model cannot provide a specific value for when the event will happen, only the probability of when the event might happen
Survival Function: \(S(x) = P(X > x) = \int_{x}^{\infty} f(u)\,du\)
Hazard Function: \(h(x) = \lim\limits_{\Delta x \rightarrow 0} \frac{P[x \leq X < x + \Delta x|X \geq x]}{\Delta x}\)
CPH Model Hazard Function: \(h(t|\mathbf Z) = h_0(t)\text{exp}(\sum\limits_{k=1}^{p} \beta_kZ_k)\)
Proportional Hazards Ratio: \(\frac{h(t|\mathbf Z)}{h(t|\mathbf Z^*)} = \text{exp}[\sum\limits_{k=1}^{p} \beta_k(Z_k - Z_k^*)]\)
Cumulative Hazard Function: \(H(x) = \int_0^x h(u) du\)
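Concordance Index: \(C = \frac{c + \frac{t_x}{2}}{c + d + t_x}\), where \(c\) is the number of concordant pairs, \(d\) the number of discordant pairs, and \(t_x\) the number of pairs tied on the predictor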
Survival Probability: \(S(t) = e^{-H(t)}\), where \(H(t)\) is the above cumulative hazard function
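To make the concordance index formula above concrete, here is a minimal Python sketch that counts concordant, discordant, and tied pairs by hand. The function name `concordance_index`, the toy data, and the decision to skip pairs with equal observed times are assumptions made for illustration; statistical packages handle ties and censoring with more care.

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """C = (c + t_x / 2) / (c + d + t_x), counted over comparable pairs."""
    concordant = discordant = tied_risk = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter time is an observed event.
        # Pairs with equal times are skipped (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        if risk_scores[a] > risk_scores[b]:
            concordant += 1      # higher risk failed first
        elif risk_scores[a] < risk_scores[b]:
            discordant += 1      # lower risk failed first
        else:
            tied_risk += 1       # tied on the predictor
    comparable = concordant + discordant + tied_risk
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + tied_risk / 2) / comparable

# Hypothetical toy data: event flag 1 = event observed, 0 = censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.7, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

In this toy data there are four concordant pairs, no discordant pairs, and one pair tied on the risk score, so the ratio is 4.5 / 5.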
There are four assumptions for CPH:
Independence assumption
Non-informative Censoring Assumption
Linearity Assumption
Proportional Hazards Assumption
Tested using the martingale residuals for each covariate, where martingale residual = observed events - expected events (Nahhas 2025)
If the plots are linear and appear to have a slope of zero, the assumption is not violated (Amini 2015)
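The "observed minus expected" definition of the martingale residual can be sketched in a few lines of Python. This is a simplified illustration, not the procedure from the cited sources: it assumes a null model with no covariates, so the expected number of events for each subject is just the Nelson-Aalen cumulative hazard at that subject's observed time. In a fitted Cox model the expected count would instead be \(\hat H_0(t_i)\,\text{exp}(\sum_k \hat\beta_k Z_{ik})\). The function names and toy data are hypothetical.

```python
def nelson_aalen(times, events):
    """Return {event_time: cumulative hazard} using d_j / n_j increments."""
    cumulative, total = {}, 0.0
    for t in sorted(set(t for t, e in zip(times, events) if e)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        total += deaths / at_risk
        cumulative[t] = total
    return cumulative

def martingale_residuals(times, events):
    """Observed events minus expected events under the null model."""
    cum = nelson_aalen(times, events)
    residuals = []
    for t, e in zip(times, events):
        # Expected events: cumulative hazard accumulated up to time t.
        expected = max((h for s, h in cum.items() if s <= t), default=0.0)
        residuals.append(e - expected)
    return residuals

# Hypothetical toy data: event flag 1 = event observed, 0 = censored.
mr_times = [2, 3, 3, 5, 8]
mr_events = [1, 1, 0, 1, 0]
mr = martingale_residuals(mr_times, mr_events)
```

With this estimator the residuals sum to zero, a quick sanity check before plotting them against a covariate.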
Provides visualization for the Kaplan-Meier estimator
Is considered to be well defined up until the largest observed study time \(t_{max}\)
Demonstrates the time where the event being modeled is expected to occur
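As a rough illustration of what the Kaplan-Meier curve plots, the sketch below computes the product-limit estimate by hand and reads off a median survival time. The function name, the toy data, and the use of the median as the "expected" event time are assumptions for this example.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time."""
    survival, curve = 1.0, []
    for t in sorted(set(t for t, e in zip(times, events) if e)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk   # product-limit step
        curve.append((t, survival))
    return curve

# Hypothetical toy data: event flag 1 = event observed, 0 = censored.
km_times = [2, 3, 3, 5, 8]
km_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(km_times, km_events)

# Median survival: first time at which the estimate drops to 0.5 or below.
median_time = next((t for t, s in km_curve if s <= 0.5), None)

# The estimate is only defined up to the largest observed study time.
t_max = max(km_times)
```

Here the largest observed time is censored, so the curve never reaches zero and says nothing about survival beyond `t_max`.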
Provides visualization of the effects of each covariate on the hazard ratio
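A positive coefficient (\(\beta_k > 0\)) corresponds to a hazard ratio greater than 1, i.e. an increased hazard; a negative coefficient corresponds to a hazard ratio below 1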
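The link between coefficient sign and hazard ratio follows directly from the proportional hazards ratio formula, and can be checked with a short Python sketch. The coefficients and covariate values below are made up for illustration and do not come from any fitted model.

```python
import math

def hazard_ratio(beta, z, z_ref):
    """exp(sum_k beta_k * (Z_k - Z_k^*)) for two covariate vectors."""
    return math.exp(sum(b * (x - r) for b, x, r in zip(beta, z, z_ref)))

# Hypothetical coefficients: the first is positive, the second negative.
beta = [0.5, -0.2]

# Raise only the first covariate by one unit: hazard ratio above 1.
hr_up = hazard_ratio(beta, z=[1, 10], z_ref=[0, 10])

# Raise only the second covariate by five units: hazard ratio below 1.
hr_down = hazard_ratio(beta, z=[0, 15], z_ref=[0, 10])
```

Because the baseline hazard \(h_0(t)\) cancels in the ratio, the result does not depend on time, which is the proportional hazards assumption in action.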